Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc: Spec of finality vote submission #120

Merged
merged 7 commits into from
Nov 19, 2024
Merged

Conversation

gitferry
Copy link
Member

@gitferry gitferry commented Nov 7, 2024

No description provided.

Copy link
Member

@Lazar955 Lazar955 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concise write up!


#### Batch submission

A batch submission mechanism is needed to deal with cases where:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would maybe note that there is a synchronization mechanism between loops so that we don't submit from fast sync loop and regular one

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current code, all the submission is done in the submission loop. So there's no race condition. The fast sync loop is for sending the lagging signal. If lagging is detected, the sumission loop will do fast sync. See

case targetBlock := <-fp.laggingTargetChan:
.

Nevertheless, in the spec, we use batch submission other than fast sync and we plan to remove the latter completely.


## Internal State

The finality provider maintains two critical persistent states:
Copy link
Collaborator

@bap2pecs bap2pecs Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it's simpler to just keep one persistent state in DB (i.e. last processed height)

last voted height can be easily retrieved from remote

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually do you think we can get rid of both from DB and only keeps one in-memory state (i.e. fp.nextHeightToProcess)?

I feel it's doable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I think we maybe can get rid of lastVotedHeight but why do you say it's easy to retrieve lastVotedHeight from the remote? It seems that we need to save this info in the finality provider on Babylon

actually do you think we can get rid of both from DB and only keeps one in-memory state (i.e. fp.nextHeightToProcess)?

if we get rid of fp.nextHeightToProcess, we will waste some resources processing blocks that have already been processed, won't we?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but why do you say it's easy to retrieve lastVotedHeight from the remote? It seems that we need to save this info in the finality provider on Babylon

b/c if we store it in a local DB, there will be two source of truth. for the OP consumer FP, we can easily add a query in the CosmWasm contract for it e.g. getFpLastVotedHeight

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we get rid of fp.nextHeightToProcess, we will waste some resources processing blocks that have already been processed, won't we?

no I mean to get rid of the local DB states but keep nextHeightToProcess

how it works is: at startup, the FP will get everything it needs from the Babylon Chain and/or the CosmWasm contract to set nextHeightToProcess

but nextHeightToProcess is not a DB state, it's an in-memory field value of the FinalityProvider object

Copy link
Collaborator

@bap2pecs bap2pecs Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

presisting lastProcessedHeight means engineers will need to have it in their mental model to make sure this data is updated and think about edge cases where it can go out of sync. and also need to think about when and where to compare with lastVotedHeight and what the results imply.

that's why I think it's not worth it if the edge case discussed will almost never gonna happen in the real world. and even if it happens, the impact is negligible.

but I don't have a strong preference either. will leave it to your call

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Will bring it to the team to make the decision. Appreciate the discussion!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a second thought, I feel that we can get rid of lastProcessedHeight completely (leaving unharmful edge case open) but we probably need to keep the lastVotedHeight both in db and memory. If we only have it in memory, there would be an attack I can think of:

Consider a fp is crashed right after sending a vote for height 100. After restart, it syncs with the remote and initialize the last voted height. However, the last voted height of the fp from the remote could be still 99 due to the late excution of the block. Therefore, it is possible for the fp to send another vote for 100. This puts the fp under the risk of being slashed due to signing the same height twice.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, the last voted height of the fp from the remote could be still 99 due to the late excution of the block.

do u mean the babylon chain didn't execute the block containing the vote in time?

but as long as the FP signs the same block at height 100, it won't be slashed.

the assumption is the FP is connected to a trusted RPC that won't send forks to it.

but i guess it's probably safer to have some redundancy considering FP is crucial

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I don't think it's safe to assume that FP is connected to a trusted RPC as we can not control it. The bottom line is that fp should not sign twice for the same height.

Comment on lines 33 to 34
2. If no finalized blocks exist:
- Start from `finalityActivationHeight`
Copy link
Collaborator

@bap2pecs bap2pecs Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a problem here.

imagine there is no finalized block, but this FP already voted for the first activated block and it's just waiting for the rest FPs to vote so the block can get finalized

starting at finalityActivationHeight means a duplicate vote

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. See updates in ab8913a

Tracks the most recent height for which a finality vote was successfully
submitted. This prevents duplicate voting on previously voted heights.

### Last processed height
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the difference between last voted height and last processed height?

Copy link
Member Author

@gitferry gitferry Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

last processed height additionally covers the case where the fp does not have voting power

4. Synchronize local state:
- Ensure local `lastVotedHeight` and `lastProcessedHeight` ≥ `lastVotedHeightRemote`
5. Begin processing at:
- `max(lastProcessedHeight + 1, lastFinalizedHeight + 1, finalityActivationHeight)`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two things:

  • lastProcessedHeight is not initialised anywhere. I assume it comes from persistent state. Lets add the step of retrievieng it from database
  • in some consumer chains, it maybe possible to vote and receive reward even after finality is reached, so starting from lastFinalizedHeight + 1 may leave some rewards on the table. I would maybe leave it as comment here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lastProcessedHeight is not initialised anywhere. I assume it comes from persistent state. Lets add the step of retrievieng it from database

In the discussion here I think we could probably be able to remove lastProcessedHeight as it is not critical to security.

in some consumer chains, it maybe possible to vote and receive reward even after finality is reached, so starting from lastFinalizedHeight + 1 may leave some rewards on the table. I would maybe leave it as comment here.

Good point. Fixed in 8732f15

### Normal Submission Loop

After the finality provider is bootstraped, it continuously monitors for
new blocks. For each new block, it performs these validation checks:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it is worth mentioning that finality provider receives block from trusted full node of consumer chain (or at least light client ) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Fixed in 8732f15

@bap2pecs
Copy link
Collaborator

how should we handle these two cases:

  1. pub rand is lagging behind a lot to the tip height
  2. FP is currently in jail or just get out of jail

@gitferry
Copy link
Member Author

pub rand is lagging behind a lot to the tip height

When a fp is started, a sperate routine for committing public randomness should be started. This part if not the cope of this spec.

FP is currently in jail or just get out of jail

In this case, the operator should get the fp unjailed first. Will have a separate doc for this

@bap2pecs
Copy link
Collaborator

bap2pecs commented Nov 18, 2024

When a fp is started, a sperate routine for committing public randomness should be started. This part if not the cope of this spec.

but the vote submission routine has to wait for the PR is timestamped, so it's still relevant?

@gitferry
Copy link
Member Author

but the vote submission routine has to wait for the PR is timestamped, so it's still relevant?

If the PR for a block is not timestamped, then the voting power is 0 for good. Even though the PR is timestamped later, the result won't change

@bap2pecs
Copy link
Collaborator

If the PR for a block is not timestamped, then the voting power is 0 for good

I am not aware of this. shouldn't vp only be determined by the btc delegations?

@gitferry
Copy link
Member Author

I am not aware of this. shouldn't vp only be determined by the btc delegations?

This condition was added ffter integration with BTC-timestamping, see the figure here: a fp will only have voting power if (1) it has top N delegations, and (2) it has timestamped randomness for the block

@bap2pecs
Copy link
Collaborator

(1) it has top N delegations

what does this mean?

@bap2pecs
Copy link
Collaborator

This condition was added ffter integration with BTC-timestamping, see the figure here: a fp will only have voting power if (1) it has top N delegations, and (2) it has timestamped randomness for the block

this is good to know. but the important implication is when we have PR lagging for whatever reason (the 4 scenarios as I described in #131), it's possible that the entire consumer chain's total VP becomes zero for some range of blocks. so FPs will skip voting for those blocks. I guess there is nothing much we can do in such an extreme case, but it is worth pointing out. cc @SebastianElvis

@gitferry
Copy link
Member Author

this is good to know. but the important implication is when we have PR lagging for whatever reason (the 4 scenarios as I described in #131), it's possible that the entire consumer chain's total VP becomes zero for some range of blocks. so FPs will skip voting for those blocks. I guess there is nothing much we can do in such an extreme case, but it is worth pointing out. cc @SebastianElvis

Yes, this is an extreme case but valid. In this case, we would have a period where no blocks will be finalized until at least one fp has voting power and sends the vote

@gitferry gitferry merged commit a5f022f into main Nov 19, 2024
12 checks passed
@gitferry gitferry deleted the gai/spec-finality-vote branch November 19, 2024 02:08
gitferry added a commit that referenced this pull request Nov 27, 2024
The `lastProcessedHeight` was introduced for fast catch up blocks for
which the fp does not have voting power. However, this case rarely
happens and is unharmful. More discussions can be found
[here](#120 (comment)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants